Overview of FIRE-2015 Shared Task on Mixed Script Information Retrieval
Abstract
The Transliterated Search track was organized for the third year at FIRE-2015. The track had three subtasks. Subtask I was on language labeling of words in code-mixed text fragments; it was conducted for eight Indian languages (Bangla, Gujarati, Hindi, Kannada, Malayalam, Marathi, Tamil and Telugu) mixed with English. Subtask II was on ad-hoc retrieval of Hindi film lyrics, movie reviews and astrology documents, where both the queries and the documents were in Hindi written either in Devanagari or in Roman transliterated form. Subtask III was on transliterated question answering, where the documents as well as the questions were in Bangla script or Roman transliterated Bangla. A total of 24 runs were submitted by 10 teams, of which 14 runs were for Subtask I and 10 runs for Subtask II. There was no participation in Subtask III. This overview presents a comprehensive report of the subtasks, datasets, submitted runs and performances.
Similar resources
Mixed Script Ad hoc Retrieval using back transliteration and phrase matching through bigram indexing: Shared Task report by BIT, Mesra
This paper describes an approach for mixed-script ad hoc retrieval, a subtask of the FIRE 2015 Shared Task on Mixed Script Information Retrieval. We participated in subtask 2 of the shared task, where a statistical model was used to carry out back transliteration to Devanagari script. To perform the search, a bigram-based index of the documents was used and search was performed using pivot t...
DA-IICT in FIRE 2015 Shared Task on Mixed Script Information Retrieval
This paper describes the methodology followed by Team Watchdogs in their submission for the shared task on Mixed Script Information Retrieval (MSIR) in FIRE 2015. I participated in subtask 1 (Query Word Labelling) and subtask 2 (Mixed-script Ad hoc retrieval). For subtask 1, a machine learning approach using a CRF classifier was used to classify the tokens as one of the possible languages using ...
Overview of the Mixed Script Information Retrieval (MSIR) at FIRE-2016
The shared task on Mixed Script Information Retrieval (MSIR) was organized for the fourth year in FIRE-2016. The track had two subtasks. Subtask-1 was on question classification where questions were in code mixed Bengali-English and Bengali was written in transliterated Roman script. Subtask-2 was on ad-hoc retrieval of Hindi film song lyrics, movie reviews and astrology documents, where both t...
Adaptive Voting in Multiple Classifier Systems for Word Level Language Identification
In social media communication, code switching has become quite a common phenomenon especially for multilingual speakers. Automatic language identification becomes both a necessary and challenging task in such an environment. In this work, we describe a CRF based system with voting approach for code-mixed query word labeling at word-level as part of our participation in the shared task on Mixed ...
AmritaCEN_NLP @ FIRE 2015 Language Identification for Indian Languages in Social Media Text
The growth of social media content, such as Twitter and Facebook messages and blog posts, has created many new opportunities for language technology. User-generated content such as tweets and blogs in most languages is written in Roman script due to distinct social culture and technology. Some of it uses the language's own script or mixed script. The primary challenges ...
Publication date: 2015